A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

نویسندگان

Long Yang

Minhao Shi

Qian Zheng

Wenjia Meng

Gang Pan

چکیده

Recently, a new multi-step temporal learning algorithm, called Q(σ), unifies n-step Tree-Backup (when σ = 0) and n-step Sarsa (when σ = 1) by introducing a sampling parameter σ. However, similar to other multi-step temporal-difference learning algorithms, Q(σ) needs much memory consumption and computation time. Eligibility trace is an important mechanism to transform the off-line updates into efficient on-line ones which consume less memory and computation time. In this paper, we further develop the original Q(σ), combine it with eligibility traces and propose a new algorithm, called Q(σ, λ), in which λ is trace-decay parameter. This idea unifies Sarsa(λ) (when σ = 1) and Q(λ) (when σ = 0). Furthermore, we give an upper error bound of Q(σ, λ) policy evaluation algorithm. We prove thatQ(σ, λ) control algorithm can converge to the optimal value function exponentially. We also empirically compare it with conventional temporal-difference learning methods. Results show that, with an intermediate value of σ, Q(σ, λ) creates a mixture of the existing algorithms that can learn the optimal value significantly faster than the extreme end (σ = 0, or 1).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bidding Strategy on Demand Side Using Eligibility Traces Algorithm

Restructuring in the power industry is followed by splitting different parts and creating a competition between purchasing and selling sections. As a consequence, through an active participation in the energy market, the service provider companies and large consumers create a context for overcoming the problems resulted from lack of demand side participation in the market. The most prominent ch...

متن کامل

Using Sliding Mode Controller and Eligibility Traces for Controlling the Blood Glucose in Diabetic Patients at the Presence of Fault

Some people suffering from diabetes use insulin injection pumps to control the blood glucose level. Sometimes, the fault may occur in the sensor or actuator of these pumps. The main objective of this paper is controlling the blood glucose level at the desired level and fault-tolerant control of these injection pumps. To this end, the eligibility traces algorithm is combined with the sliding mod...

متن کامل

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

Temporal-difference (TD) learning is an important field in reinforcement learning. Sarsa and Q-Learning are among the most used TD algorithms. The Q(σ) algorithm (Sutton and Barto (2017)) unifies both. This paper extends the Q(σ) algorithm to an online multi-step algorithm Q(σ, λ) using eligibility traces and introduces Double Q(σ) as the extension of Q(σ) to double learning. Experiments sugges...

متن کامل

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Temporal diierence (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This p...

متن کامل

An Introduction to Temporal Difference Learning

Temporal Difference learning is one of the most used approaches for policy evaluation. It is a central part of solving reinforcement learning tasks. For deriving optimal control, policies have to be evaluated. This task requires value function approximation. At this point TD methods find application. The use of eligibility traces for backpropagation of updates as well as the bootstrapping of th...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

CoRR

دوره abs/1802.03171 شماره

صفحات -

تاریخ انتشار 2018

A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning

نویسندگان

چکیده

منابع مشابه

Bidding Strategy on Demand Side Using Eligibility Traces Algorithm

Using Sliding Mode Controller and Eligibility Traces for Controlling the Blood Glucose in Diabetic Patients at the Presence of Fault

Double Q($\sigma$) and Q($\sigma, \lambda$): Unifying Reinforcement Learning Control Algorithms

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

An Introduction to Temporal Difference Learning

عنوان ژورنال:

اشتراک گذاری